12 research outputs found

    Női ÁVH-s koncepciós perek : az "ávós" nők jelenléte a kommunista igazságszolgáltatásban [Show trials of female ÁVH members: the presence of "ÁVH" women in the communist justice system]

    Data Augmentation for Machine Translation via Dependency Subtree Swapping

    We present a generic framework for data augmentation via dependency subtree swapping that is applicable to machine translation. We extract corresponding subtrees from the dependency parse trees of the source and target sentences and swap these across bisentences to create augmented samples. We perform thorough filtering based on graph-based similarities of the dependency trees and additional heuristics to ensure that extracted subtrees correspond to the same meaning. We conduct resource-constrained experiments on 4 language pairs in both directions using the IWSLT text translation datasets and the Hunglish2 corpus. The results demonstrate consistent improvements in BLEU score over our baseline models in 3 out of 4 language pairs. Our code is available on GitHub.

    HunSum-1 : an abstractive summarization dataset for Hungarian

    We introduce HunSum-1: a dataset for Hungarian abstractive summarization, consisting of 1.14M news articles. The dataset is built by collecting, cleaning and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the created dataset by performing a quantitative and qualitative analysis on the models’ results. The HunSum-1 dataset, all models used in our experiments and our code are available as open source.

    SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

    This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas. Peer reviewed.

    Characterization of an Aerosol-Based Photobioreactor for Cultivation of Phototrophic Biofilms

    Phototrophic biofilms, in particular terrestrial cyanobacteria, offer a variety of biotechnologically interesting products such as natural dyes, antibiotics or dietary supplements. However, phototrophic biofilms are difficult to cultivate in submerged bioreactors. A new generation of biofilm photobioreactors imitates the natural habitat, resulting in higher productivity. In this work, an aerosol-based photobioreactor is presented that was characterized for the cultivation of phototrophic biofilms. Experiments and simulation of aerosol distribution showed a uniform aerosol supply to biofilms. Compared to previous prototypes, the growth of the terrestrial cyanobacterium Nostoc sp. was almost tripled. Different surfaces for biofilm growth were investigated regarding hydrophobicity, contact angle, and light and temperature distribution. Further, the results were successfully simulated. Finally, the growth of Nostoc sp. was investigated on different surfaces and the biofilm thickness was measured noninvasively using optical coherence tomography. The cultivation surface was shown to have no influence on biomass production, but it did affect biofilm thickness.